1 Introduction

This is a submission for the assignment "Visualization in R using ggplot2". Link to assignment

The content covers:

  • Marathon data set
  • Questions that can be answered using the data set
  • Conclusion

2 The Marathon data set

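library(DT)  # provides datatable()

# df is assumed to be the marathon data frame, already loaded
rows <- nrow(df)  # number of measurements
cols <- ncol(df)  # number of variables

datatable(df)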
## Warning in instance$preRenderHook(instance): It seems your data is too big
## for client-side DataTables. You may consider server-side processing: https://
## rstudio.github.io/DT/server.html

The marathon data frame has 22 variables (columns) and 16433 measurements (rows).

3 Some questions appropriate for the data:

  • Which age groups participate most actively in marathons?
  • What does the distribution of finishers look like for each gender?
  • What percentage of participants exited or were disqualified, by age group?
  • How reliable is the gun time measurement in long races such as a marathon?
  • What are the different types of participants (runner/jogger/walker)?
  • Is there a strong correlation among gender, category, and the time taken from the 1st to the 2nd stage, from the 2nd stage to halfway, and from halfway to the finish?
  • Among the top 10 finishers, how did their ranks change over the different stages of the race?

3.1 The figure below shows the count of participants by age

Most of the participants are young or middle-aged, and the pattern is similar for both genders.
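A minimal sketch of how an age-wise count plot of this kind might be drawn with ggplot2. The data frame `toy` and its columns `age_group` and `gender` are made-up stand-ins, since the column names of the marathon data are not shown in this report.

```r
library(ggplot2)

# Synthetic stand-in for the marathon data; column names and values are assumptions.
set.seed(1)
toy <- data.frame(
  age_group = sample(c("18-29", "30-39", "40-49", "50-59", "60+"), 200, replace = TRUE),
  gender    = sample(c("F", "M"), 200, replace = TRUE)
)

# One bar per age group, with the two genders placed side by side.
p_age <- ggplot(toy, aes(x = age_group, fill = gender)) +
  geom_bar(position = "dodge") +
  labs(x = "Age group", y = "Number of participants", fill = "Gender")

print(p_age)
```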

3.2 The figure below shows the frequency of the times at which participants finished the race

The diagram below shows that men finished the race faster than women, but the overall shape of the distribution is the same for both. This may imply that both genders are competing optimally, given that the number of female participants is around 5607 while the number of male participants is around 10826.
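One possible way to compare the two finish-time distributions is sketched below. The toy data, the column names `gender` and `finish_hours`, and the means used to generate the values are all invented for illustration. The sketch plots density rather than raw counts, which is a choice made here so that groups of different sizes can be compared by shape; it is not necessarily how the original figure was built.

```r
library(ggplot2)

# Synthetic finish times in hours; group sizes and means are made up.
set.seed(2)
toy <- data.frame(
  gender       = rep(c("F", "M"), times = c(100, 200)),
  finish_hours = c(rnorm(100, mean = 4.6, sd = 0.7),
                   rnorm(200, mean = 4.2, sd = 0.7))
)

# Overlaid histograms scaled to density (after_stat() needs ggplot2 >= 3.3.0).
p_finish <- ggplot(toy, aes(x = finish_hours, fill = gender)) +
  geom_histogram(aes(y = after_stat(density)),
                 binwidth = 0.25, alpha = 0.6, position = "identity") +
  labs(x = "Finish time (hours)", y = "Density", fill = "Gender")

print(p_finish)
```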

3.3 The figure below shows the percentage of disqualified or incomplete races by age group and gender

Older women have a slightly higher disqualification rate than men of the same age group.
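The percentage per age group and gender could be computed along the following lines. This is only a sketch: `toy`, `age_group`, `gender` and the logical flag `finished` are hypothetical names, and the 10% non-finish probability is arbitrary.

```r
library(ggplot2)

# Synthetic data; `finished` is FALSE for a disqualified or incomplete race.
set.seed(3)
toy <- data.frame(
  age_group = sample(c("18-39", "40-59", "60+"), 300, replace = TRUE),
  gender    = sample(c("F", "M"), 300, replace = TRUE),
  finished  = sample(c(TRUE, FALSE), 300, replace = TRUE, prob = c(0.9, 0.1))
)

# Percentage of non-finishers within each age group and gender.
pct <- aggregate(finished ~ age_group + gender, data = toy,
                 FUN = function(x) 100 * mean(!x))
names(pct)[3] <- "pct_not_finished"

p_dnf <- ggplot(pct, aes(x = age_group, y = pct_not_finished, fill = gender)) +
  geom_col(position = "dodge") +
  labs(x = "Age group", y = "Not finished (%)", fill = "Gender")

print(p_dnf)
```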

3.4 The figure below shows a comparison between chip time and gun time

The plot shows that gun time is not a reliable measure of finish time but rather a ceremonial one, as it deviates a lot from the actual chip time.
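The comparison can be illustrated with a small sketch. Gun time runs from the starting gun, while chip time runs from the moment a participant crosses the start line, so the gap between them is the delay in reaching the start. The toy values, the column names `chip_min` and `gun_min`, and the 0 to 15 minute delay range are assumptions made for this example.

```r
library(ggplot2)

# Synthetic times in minutes; the start delay range is made up.
set.seed(4)
chip_min        <- runif(200, min = 150, max = 360)
start_delay_min <- runif(200, min = 0, max = 15)
toy <- data.frame(chip_min = chip_min, gun_min = chip_min + start_delay_min)

# Points above the dashed y = x line have a gun time longer than their chip time.
p_time <- ggplot(toy, aes(x = chip_min, y = gun_min)) +
  geom_point(alpha = 0.5) +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  labs(x = "Chip time (minutes)", y = "Gun time (minutes)")

print(p_time)

# Numeric summary of the gap between the two measurements.
gap_min <- toy$gun_min - toy$chip_min
print(summary(gap_min))
```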

3.5 The figure below shows the types of participants: runner, jogger and walker

Participants are classified by their finishing time: runners finish in under 3 hours, joggers finish in 3 to 5 hours, and walkers take more than 5 hours.
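The classification rule above can be sketched as a small helper. The function name `classify_participant` is hypothetical, and treating exactly 3 hours and exactly 5 hours as "jogger" is an assumption about how the boundaries are handled.

```r
# Classify finishing times (in hours) into runner / jogger / walker.
# Assumption: exactly 3 h and exactly 5 h both count as "jogger".
classify_participant <- function(finish_hours) {
  ifelse(finish_hours < 3, "runner",
         ifelse(finish_hours <= 5, "jogger", "walker"))
}

# Made-up finishing times, including the two boundary values.
finish_hours <- c(2.5, 3, 4.2, 5, 5.5)
types <- factor(classify_participant(finish_hours),
                levels = c("runner", "jogger", "walker"))

# Count of participants per type.
print(table(types))
```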

3.6 The figure below shows the correlation between variables such as gender, category, and the time taken at various stages of the marathon

It can be concluded that participants who are fast in the early stages are more likely to achieve a better overall position, as expected.
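A correlation plot of this kind might be produced roughly as follows. Everything in the sketch is an assumption: the column names, the numeric coding of gender, and the synthetic segment times, which share a made-up per-participant `pace` term so that they come out correlated.

```r
library(ggplot2)

# Synthetic data; segment times share a latent pace so they correlate.
set.seed(6)
n    <- 200
pace <- rnorm(n)
toy <- data.frame(
  gender_code      = sample(0:1, n, replace = TRUE),  # categorical coded as 0/1
  stage1_to_stage2 = 60 + 5 * pace + rnorm(n),
  stage2_to_half   = 45 + 4 * pace + rnorm(n),
  half_to_finish   = 130 + 12 * pace + rnorm(n, sd = 4)
)

# Pearson correlation matrix, reshaped to long form for plotting.
cor_mat  <- cor(toy)
cor_long <- as.data.frame(as.table(cor_mat))
names(cor_long) <- c("var_x", "var_y", "r")

# Heat map of the correlation coefficients.
p_cor <- ggplot(cor_long, aes(x = var_x, y = var_y, fill = r)) +
  geom_tile() +
  scale_fill_gradient2(limits = c(-1, 1)) +
  labs(x = NULL, y = NULL, fill = "Correlation")

print(p_cor)
```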

3.7 The figure below shows the positions of the top 10 finishers at different stages of the marathon

David (2nd position) actually performed better throughout the race, except in the last stage. He has a good chance of winning a future marathon, as his performance is consistent.
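One way to derive positions at each stage is to rank the cumulative time at every checkpoint and then follow the top 10 finishers. The sketch below uses invented segment times and the hypothetical checkpoint names `stage1`, `stage2`, `halfway` and `finish`; it does not use the real data.

```r
library(ggplot2)

# Synthetic segment times (minutes) for 30 made-up participants.
set.seed(7)
n   <- 30
seg <- matrix(runif(n * 4, min = 28, max = 35), nrow = n)

# Cumulative time at each checkpoint, one row per participant.
cum <- t(apply(seg, 1, cumsum))
colnames(cum) <- c("stage1", "stage2", "halfway", "finish")

# Position of every participant at every checkpoint (1 = fastest so far).
ranks <- apply(cum, 2, rank, ties.method = "first")

# Row indices of the 10 fastest finishers.
top10 <- order(cum[, "finish"])[1:10]

# Long form: one row per (participant, checkpoint) pair.
rank_long <- data.frame(
  runner = rep(paste0("runner_", top10), times = 4),
  stage  = factor(rep(colnames(cum), each = 10), levels = colnames(cum)),
  rank   = as.vector(ranks[top10, ])
)

# Reversed y axis so that position 1 is drawn at the top.
p_rank <- ggplot(rank_long, aes(x = stage, y = rank, group = runner, colour = runner)) +
  geom_line() +
  geom_point() +
  scale_y_reverse() +
  labs(x = "Checkpoint", y = "Position", colour = "Participant")

print(p_rank)
```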

Link to code and files